FDR-Corrected Sparse Canonical Correlation Analysis with Applications to Imaging Genomics

Authors

  • Alexej Gossmann
  • Pascal Zille
  • Vince Calhoun
  • Yu-Ping Wang
Abstract

Reducing the number of false positive discoveries is presently one of the most pressing issues in the life sciences. It is of especially great importance for many applications in neuroimaging and genomics, where datasets are typically high-dimensional, which means that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that can be employed to address that issue. Thus it has gained great popularity as a tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique that is used to make sense of the cross-correlation of two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same mental illness patients), and sparse CCA extends the classical method to high-dimensional settings. Here we propose a way of applying the FDR concept to sparse CCA, and a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivation as well as simulation studies show that our procedure indeed keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics dataset from the Philadelphia Neurodevelopmental Cohort. Our results link the brain activity during a working memory task, as measured by functional magnetic resonance imaging (fMRI), to the corresponding subjects’ genomic data. Our findings are supported by previous work on cognitive ability, neurodevelopmental, and other mental disorders.
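
As a rough illustration of the FDR criterion mentioned in the abstract, the sketch below implements the classical Benjamini-Hochberg step-up procedure for a vector of p-values. It is not the FDR correction for sparse CCA proposed in the paper, only a generic example of FDR-based multiple testing; the function name `benjamini_hochberg` and the parameter `q` are hypothetical names chosen for this example.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at target FDR level q.

    Generic Benjamini-Hochberg step-up procedure (illustrative only; this is
    NOT the sparse-CCA correction described in the abstract). FDR control at
    level q is guaranteed under independence of the test statistics.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    thresholds = q * np.arange(1, m + 1) / m   # k-th smallest p is compared to q*k/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank k whose p-value meets its
        # threshold, then reject every hypothesis with rank <= k.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p_vals, q=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even if its own p-value misses its threshold, as long as a larger p-value further down the sorted list meets its own threshold.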

Similar resources

Correlating Cellular Features with Gene Expression using CCA

To understand the biology of cancer, joint analysis of multiple data modalities, including imaging and genomics, is crucial. We propose the use of canonical correlation analysis (CCA) and a sparse variant as a preliminary discovery tool for identifying connections across modalities, specifically between gene expression and features describing cell and nucleus shape, texture, and stain intensity...

Sparse CCA: Adaptive Estimation and Computational Barriers

Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields including genomics and imaging, to extract meaningful features as well as to use the features for subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse can...

FDR made easy in differential feature discovery and correlation analyses

SUMMARY Rapid progress in technology, particularly in high-throughput biology, allows the analysis of thousands of genes or proteins simultaneously, where the multiple comparison problem arises. Global false discovery rate (gFDR) analysis statistically controls this error, computing the ratio of the number of false positives over the total number of rejections. Local FDR (lFDR) method can asso...

Dementia induces correlated reductions in white matter integrity and cortical thickness: A multivariate neuroimaging study with sparse canonical correlation analysis

We use a new, unsupervised multivariate imaging and analysis strategy to identify related patterns of reduced white matter integrity, measured with the fractional anisotropy (FA) derived from diffusion tensor imaging (DTI), and decreases in cortical thickness, measured by high resolution T1-weighted imaging, in Alzheimer's disease (AD) and frontotemporal dementia (FTD). This process is based on...

Minimax Estimation in Sparse Canonical Correlation Analysis

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for...

Publication date: 2017